sc_pipeline

2023-12-20

HemaScopeR scRNA-seq data


Input data

Two samples were read and processed:

Input data path
D:/data/bm_st/data_all/ST10_DHL/outs
D:/data/bm_st/data_all/ST5_DHL/outs

Quality control

Data filtering

The number of features, the number of counts, and the percentage of mitochondrial genes in the 2 samples are shown below.

The following arguments are used to filter data.

Argument | Value | Meaning
min.cells | 10 | Include features detected in at least this many cells
min.feature | 200 | Include cells where at least this many features are detected
percent.mt.limit | 20 | Include cells where at most this percent of mitochondrial genes are detected
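
The three thresholds above can be illustrated with a minimal Python sketch. It is not the code HemaScopeR or Seurat runs: the function name `filter_counts`, the input layout, the `mt-` gene-name prefix used to recognise mitochondrial genes, and the order in which the filters are applied (cells first, then features) are all assumptions made for illustration.

```python
def filter_counts(counts, min_cells=10, min_feature=200,
                  percent_mt_limit=20.0, mt_prefix="mt-"):
    """counts: dict gene -> dict cell -> raw count (zero entries may be omitted).

    Returns (kept_genes, kept_cells) as two sets.
    """
    n_features, total, mt_total = {}, {}, {}
    for gene, cells in counts.items():
        # Assumption: mitochondrial genes are recognised by a name prefix.
        is_mt = gene.lower().startswith(mt_prefix.lower())
        for cell, c in cells.items():
            if c <= 0:
                continue
            n_features[cell] = n_features.get(cell, 0) + 1
            total[cell] = total.get(cell, 0) + c
            if is_mt:
                mt_total[cell] = mt_total.get(cell, 0) + c

    # Keep cells with enough detected features and a low mitochondrial percentage.
    kept_cells = {
        cell for cell in total
        if n_features[cell] >= min_feature
        and 100.0 * mt_total.get(cell, 0) / total[cell] <= percent_mt_limit
    }

    # Keep features detected in at least `min_cells` of the retained cells.
    kept_genes = {
        gene for gene, cells in counts.items()
        if sum(1 for cell, c in cells.items()
               if c > 0 and cell in kept_cells) >= min_cells
    }
    return kept_genes, kept_cells


# Toy example with small thresholds so the effect is visible.
toy = {
    "Kit":    {"c1": 3, "c2": 1, "c3": 2},
    "Gata1":  {"c1": 1, "c2": 0, "c3": 4},
    "mt-Co1": {"c1": 1, "c2": 9, "c3": 0},
    "Rare":   {"c1": 1},
}
genes, cells = filter_counts(toy, min_cells=2, min_feature=2, percent_mt_limit=20)
```

In the toy example, cell `c2` is dropped because 90% of its counts are mitochondrial, and the features `mt-Co1` and `Rare` are dropped because each is detected in only one retained cell.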

Normalization, dimensionality reduction and clustering

Normalization, dimensionality reduction and clustering are performed according to the standard Seurat workflow. The clustering results are shown below.

The following arguments are used in the step above.

Argument | Value | Meaning
scale.factor | 10000 | Sets the scale factor for cell-level normalization
vars.to.regress |  | Variables to regress out
ndims | 50 | Total number of PCs to compute
PCs | 1:35 | Which dimensions of PCs to use in FindNeighbors, RunTSNE and RunUMAP functions of Seurat
n.neighbors | 50 | The number of neighboring points used in RunUMAP and FindNeighbors functions
resolution | 0.4 | The argument resolution of FindClusters function in Seurat
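
To show what scale.factor does, the sketch below applies log-normalization to a single cell: each count is divided by the cell's total count, multiplied by the scale factor, and transformed with log1p. This assumes that Seurat's default "LogNormalize" method is the one used here, which the report does not state explicitly. The function name `log_normalize` is hypothetical.

```python
import math


def log_normalize(cell_counts, scale_factor=10000):
    """cell_counts: dict gene -> raw count for ONE cell.

    Assumed formula (Seurat's default LogNormalize):
    log1p(count / total_count_of_cell * scale_factor).
    """
    total = sum(cell_counts.values())
    if total == 0:
        # An empty cell has nothing to normalize.
        return {gene: 0.0 for gene in cell_counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in cell_counts.items()
    }


normalized = log_normalize({"Kit": 1, "Gata1": 3}, scale_factor=10000)
```

With a scale factor of 10000, the normalized values can be read as log-transformed counts per 10,000 total counts in the cell.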

Cell type identification

We used abcCellmap to annotate the cell types; scmap and Seurat are used for the prediction. The predicted cell types are shown below.

In addition, mapping data to a reference dataset can identify shared cell states that are present across different datasets. We provided a reference dataset containing 1354 cells with 10 labels. The FindTransferAnchors function is used to integrate the query data and the reference data. The predicted labels of the query data shown below are determined by the TransferData function.

The relevant argument settings are as follows.

Argument | Value | Meaning
PCs | 1:35 | The argument dims of FindTransferAnchors and TransferData functions in Seurat

Visualization

To facilitate data visualization, we used phateR, UMAP and tSNE for dimensionality reduction. The visualizations are shown below.

phateR
tSNE
UMAP

Differentially expressed genes

Differentially expressed genes are identified using the FindAllMarkers function. All markers found are stored in ../Step6.Find_DEGs/sc_object.markerGenes.csv. The first 5 markers of each cluster are shown below.

The relevant argument settings are as follows.

Argument | Value | Meaning
min.pct | 0.25 | The argument min.pct of FindAllMarkers function in Seurat
logfc.threshold | 0.25 | The argument logfc.threshold of FindAllMarkers function in Seurat
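
The two thresholds act as pre-filters that decide which genes are tested at all. A gene is tested only if it is detected in at least a fraction min.pct of the cells in either group, and if its average expression differs by at least logfc.threshold on the log scale. The sketch below is a simplified illustration, not Seurat's implementation: the function name `passes_prefilter` is hypothetical, and the fold change is computed as a log2 ratio of means with a pseudocount of 1, which is an assumption (the exact calculation differs between Seurat versions).

```python
import math


def passes_prefilter(expr_in, expr_out, min_pct=0.25, logfc_threshold=0.25):
    """expr_in / expr_out: expression of one gene in the cluster and in all other cells."""
    # Fraction of cells in each group where the gene is detected.
    pct_in = sum(1 for x in expr_in if x > 0) / len(expr_in)
    pct_out = sum(1 for x in expr_out if x > 0) / len(expr_out)

    # Simplified log2 fold change of mean expression (pseudocount of 1 is an assumption).
    mean_in = sum(expr_in) / len(expr_in)
    mean_out = sum(expr_out) / len(expr_out)
    log2fc = math.log2(mean_in + 1) - math.log2(mean_out + 1)

    return max(pct_in, pct_out) >= min_pct and abs(log2fc) >= logfc_threshold


kept = passes_prefilter([2, 3, 0, 4], [0, 0, 0, 1])      # detected widely, clear difference
rare = passes_prefilter([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0] * 10)  # detected in 10% of cells
flat = passes_prefilter([1, 1, 1, 1], [1, 1, 1, 1])      # no difference between groups
```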

Cell cycle

The cell cycle phase of each cell is classified using the cyclone function. The scores of the G1 and G2M phases are defined as the average expression of cell cycle genes from "Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma". The proportion of cell cycle phases in each cluster is shown below.

The distribution of the G1G2 score is also shown below.
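
The per-cluster proportion of cell cycle phases mentioned above can be computed from two parallel vectors: one cluster label and one phase label per cell. The sketch below is illustrative only. The function name `phase_proportions` and the toy labels are made up, and the phase names G1, S and G2M are assumed to be the labels assigned by cyclone.

```python
from collections import Counter, defaultdict


def phase_proportions(clusters, phases):
    """Return {cluster: {phase: fraction of the cluster's cells in that phase}}."""
    counts = defaultdict(Counter)
    for cluster, phase in zip(clusters, phases):
        counts[cluster][phase] += 1
    return {
        cluster: {phase: n / sum(c.values()) for phase, n in c.items()}
        for cluster, c in counts.items()
    }


# Toy labels: four cells in cluster 0, two cells in cluster 1.
proportions = phase_proportions(
    [0, 0, 0, 0, 1, 1],
    ["G1", "G1", "S", "G2M", "G1", "S"],
)
```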

Heterogeneity

Cell heterogeneity within each cluster is reflected by the distribution of the Spearman correlation coefficients of gene expression between every pair of cells. The boxplot below shows the distribution of Spearman correlation coefficients for each cluster.
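
As an illustration of the quantity behind the boxplot, the sketch below computes the Spearman correlation for every pair of cells within one cluster. Spearman correlation is the Pearson correlation of the ranks, with tied values receiving their average rank. The helper names are hypothetical and the code is a plain-Python sketch, not the pipeline's implementation.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(cells):
    """cells: dict cell -> expression vector (same gene order for every cell)."""
    return [spearman(cells[a], cells[b]) for a, b in combinations(sorted(cells), 2)]


cluster_cells = {"c1": [1, 2, 3, 4], "c2": [10, 20, 30, 40], "c3": [4, 3, 2, 1]}
coefficients = pairwise_spearman(cluster_cells)
```

A cluster whose coefficients concentrate near 1 is homogeneous. A wide or low distribution suggests more heterogeneity among its cells.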

Marker genes

The marker genes of the following lineages are visualized using violin plots.

Lineage | Marker genes | Source
Hematopoietic stem cells | Slamf1, Itga2b, Kit, Ly6a, Bmi1, Gata2, Hlf, Meis1, Mpl, Mcl1, Gfi1, Gfi1b, Hoxb5 | Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis
Multipotent progenitor | Mki67, Mpo, Elane, Ctsg, Calr | The bone marrow microenvironment at single-cell resolution
Erythroid lineage | Klf1, Gata1, Mpl, Epor, Vwf, Zfpm1, Fhl1, Adgrg2, Cavin2, Gypa, Tfrc, Hbb-bs, Hbb-y | Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis
Lymphoid lineage | Tcf3, Ikzf1, Notch1, Flt3, Dntt, Btg2, Tcf7, Rag1, Ptprc, Ly6a, Blnk | Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis
Myeloid lineage | Gfi1, Spi1, Mpo, Csf2rb, Csf1r, Gfi1b, Hk3, Csf2ra, Csf3r, Sp1, Fcgr3 | Single-cell characterization of haematopoietic progenitors and their trajectories in homeostasis and perturbed haematopoiesis

Lineage scores

We calculate lineage scores for specified gene sets based on the provided expression data. Four lineages (HSPC, myeloid, B cell, T/NK) are considered. The signatures of these four lineages are shown as follows.

Lineage | Marker genes
HSPC lineage | CD34, KIT, AVP, FLT3, MME, CD7, CD38, CSF1R, FCGR1A, MPO, ELANE, IL3RA
Myeloid lineage | LYZ, CD36, MPO, FCGR1A, CD4, CD14, CD300E, ITGAX, FCGR3A, FLT3, AXL, SIGLEC6, CLEC4C, IRF4, LILRA4, IL3RA, IRF8, IRF7, XCR1, CD1C, THBD, MRC1, CD34, KIT, ITGA2B, PF4, CD9, ENG, KLF, TFRC
B cell lineage | CD79A, IGLL1, RAG1, RAG2, VPREB1, MME, IL7R, DNTT, MKI67, PCNA, TCL1A, MS4A1, IGHD, CD27, IGHG3
T/NK cell lineage | CD3D, CD3E, CD8A, CCR7, IL7R, SELL, KLRG1, CD27, GNLY, NKG7, PDCD1, TNFRSF9, LAG3, CD160, CD4, CD40LG, IL2RA, FOXP3, DUSP4, IL2RB, KLRF1, FCGR3A, NCAM1, XCL1, MKI67, PCNA, KLRF
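
The report does not state how a lineage score is derived from a signature. One common and simple choice, assumed here purely for illustration, is the mean expression of the signature genes that are present in the data. The function name `lineage_score` and the shortened signatures below are hypothetical, and the actual scoring used by the pipeline may differ.

```python
def lineage_score(cell_expr, signature):
    """Mean expression of the signature genes found in one cell's profile.

    cell_expr: dict gene -> normalized expression for ONE cell.
    Assumption: genes missing from the data are ignored rather than counted as zero.
    """
    present = [gene for gene in signature if gene in cell_expr]
    if not present:
        return 0.0
    return sum(cell_expr[gene] for gene in present) / len(present)


# Shortened signatures taken from the first genes of two rows in the table above.
signatures = {
    "HSPC": ["CD34", "KIT", "AVP"],
    "B cell": ["CD79A", "IGLL1", "RAG1"],
}
cell = {"CD34": 2.0, "KIT": 1.0, "CD79A": 0.0}
scores = {name: lineage_score(cell, genes) for name, genes in signatures.items()}
```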

The generated heatmaps of lineage scores and gene expression patterns are as follows.

Heatmap of lineage scores
Gene expression patterns

The corresponding results are stored in lineage_signatures_scores.csv.

GSVA

Gene Set Variation Analysis (GSVA) evaluates the enrichment of gene sets in each cluster to infer their function. The GSVA package is used to perform GSVA on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway gene set. The pathways with an adjusted P-value less than 0.05 are shown in the heatmap.
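
The report does not say which multiple-testing correction produces the adjusted P-values. The sketch below assumes the Benjamini-Hochberg procedure, a common default, and shows how pathways would then be selected at the 0.05 cutoff. The function name `bh_adjust` and the P-values are illustrative only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this P-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]          # made-up P-values for four pathways
adjusted = bh_adjust(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
```

In this toy example only the first pathway stays below 0.05 after adjustment, although three of the raw P-values are below 0.05.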

Trajectory analysis

Trajectory analysis can help infer the differentiation process between hematopoietic cells at the single-cell level. In order to obtain reliable results, three methods, monocle2, slingshot and scVelo, are used for trajectory analysis.

The data is analyzed based on the monocle2 tutorial. The DDRTree algorithm is used for dimensionality reduction. The Seurat clustering results and the states obtained from the orderCells function are presented along the minimum spanning tree as follows.

Cell type trajectory | State trajectory

The data is also analyzed based on the slingshot tutorial. The smooth curves modeling development along various lineages are shown in the first two dimensions of principal component space.

The data is then analyzed based on the scVelo tutorial. The grid plot and stream plot of inferred velocities are shown below.

Grid plot | Stream plot

Transcription factor analysis

Transcription factors (TFs) regulate the amount of messenger RNA (mRNA) produced from their target genes. TF analysis is performed using SCENIC. The output files and reports of SCENIC are located in ../Step13.TF_analysis/int. The inferred expression of TFs in each cluster is shown in the heatmap.

Cell-cell interaction

Signal crosstalk between cells is crucial for cellular state and behavior. CellChat is used to infer and analyze cell-cell communication, following the CellChat tutorial. The visualizations of each cell-cell communication network are stored under ../Step15.Cell_cell_interection. The circle plots of overall interaction information are shown below.

The interaction weights or strength | The interaction number

The incoming and outgoing signaling patterns are also shown as follows.

Incoming signaling patterns | Outgoing signaling patterns

Outputs

The outputs include:

  1. HTML report: SC_base.html.
  2. Markdown report: SC_base.md.
  3. Files: sc_test.